- Regression trees
- Classification trees
- Bagging and Random Forests
- Boosting trees
- Variable Importance measures
11/10/2020
Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance of a statistical learning method. We introduce it here because it is particularly useful and frequently used in the context of decision trees.
Recall that given a set of \(n\) independent observations \(X_1, \dots, X_n\), each with variance \(\sigma^2\), the variance of the mean \(\overline{X}\) of the observations is given by \(\frac{\sigma^2}{n}\). In other words, averaging a set of observations reduces variance.
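A quick simulation makes this concrete. As a sketch in Python/NumPy (the values of \(\sigma^2\) and \(n\) below are arbitrary choices for the demo), we can check empirically that the variance of the mean of \(n\) observations is close to \(\sigma^2 / n\):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0  # assumed variance of each observation (demo choice)
n = 50        # number of observations averaged per sample mean

# Draw 10,000 samples of size n and take each sample's mean.
means = rng.normal(0.0, np.sqrt(sigma2), size=(10_000, n)).mean(axis=1)

# Empirical variance of the sample mean vs. the theoretical sigma^2 / n.
print(means.var(), sigma2 / n)
```

The two printed numbers should agree closely: averaging cut the variance by a factor of \(n\).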
Now imagine that these “observations” are the predictions of a statistical learning method, each fit on its own training set. Averaging those predictions would reduce variance, but this is not practical because we generally do not have access to multiple training sets.
However, we can bootstrap: take repeated samples, with replacement, from the (single) training data set.
To Bootstrap Aggregate (Bag) we:

- Draw \(B\) bootstrap samples of size \(n\) (sampled with replacement) from the training data.
- Grow a deep, unpruned tree on each bootstrap sample.
- Average the \(B\) trees’ predictions (regression) or take a majority vote (classification).
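The bagging recipe can be sketched directly. The course's examples appear to use R; this is an illustrative Python/scikit-learn analogue, with a toy dataset standing in for the training set:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Toy regression data standing in for the training set (demo assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
B = 100  # number of bootstrap samples / trees

preds = np.zeros((B, len(X)))
for b in range(B):
    # 1. Draw a bootstrap sample: n rows, with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Grow a deep (low-bias, high-variance) tree on the sample.
    tree = DecisionTreeRegressor(random_state=b).fit(X[idx], y[idx])
    # 3. Record its predictions on the full data.
    preds[b] = tree.predict(X)

# The bagged prediction averages over the B trees.
y_bag = preds.mean(axis=0)
```

Each individual tree is noisy, but the average `y_bag` has much lower variance, exactly as the \(\sigma^2/n\) argument predicts.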
We “randomize” our data and then build a lot of big, and hence noisy, trees.
Plotting the error against the number of trees suggests you just need a couple of hundred trees!
Note: Bagging is the special case of Random Forests with \(m = p\), i.e., all \(p\) predictors are available at every split rather than a random subset of \(m\) of them.
Like bagging, boosting is a general approach that can be applied to many statistical learning methods for regression or classification. Boosting uses many trees, too. However, the trees are grown sequentially: each tree is grown using information from previously grown trees.
We have to choose:

- the number of trees \(B\),
- the shrinkage parameter \(\lambda\), which controls how slowly each tree learns, and
- the number of splits \(d\) in each tree.
Boosting for categorical \(y\) works in an analogous manner, but it is more complicated to define what is “left over” (the residual) at each step.
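The three tuning choices map directly onto the parameters of a boosting implementation. The course output suggests R's gbm; this is an assumed scikit-learn sketch on toy data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=1)

# The trees are grown sequentially, each fit to what is "left over"
# (the residuals) after the previous trees.
boost = GradientBoostingRegressor(
    n_estimators=500,    # B: the number of trees
    learning_rate=0.01,  # lambda: the shrinkage parameter
    max_depth=2,         # d: the depth of each (small) tree
    random_state=0,
).fit(X, y)
```

With a small \(\lambda\), each tree contributes only a little, so a larger \(B\) is needed; \(d\) controls how much interaction between predictors each tree can capture.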
Ensemble methods can give dramatically better fits than simple trees.
By computing summary measures, you can get some sense of how the trees work. In particular, we are often interested in which variables in \(x\) are really the “important” ones.
##             var    rel.inf
## lstat     lstat 32.9769020
## rm           rm 31.7265743
## dis         dis  9.6760567
## nox         nox  5.1820613
## crim       crim  4.9207831
## black     black  3.8421932
## ptratio ptratio  3.7491902
## age         age  3.1540710
## tax         tax  1.6917150
## chas      chas   1.0245005
## rad         rad  1.0180367
## indus     indus  0.9034953
## zn           zn  0.1344207
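The table above is relative influence from R's gbm, scaled to sum to 100. The same kind of ranking can be sketched in scikit-learn, whose impurity-based importances are instead normalized to sum to 1 (the toy dataset below is an assumption for the demo):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# 2 of the 6 features are informative, so the ranking has a clear signal.
X, y = make_regression(n_samples=300, n_features=6, n_informative=2,
                       noise=1.0, random_state=0)

boost = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances; scikit-learn normalizes them to sum to 1,
# whereas gbm's summary() scales relative influence to sum to 100.
imp = boost.feature_importances_
ranking = np.argsort(imp)[::-1]  # feature indices, most important first
```

As with lstat and rm in the Boston output above, a couple of features typically dominate the ranking.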